Morpho-syntactic Clues for Terminological Processing in Serbian
Abstract
In this paper we discuss morpho-syntactic clues that can be used to facilitate terminological processing in Serbian. A method (called SRCE) for automatic extraction of multiword terms is presented. The approach incorporates a set of generic morpho-syntactic filters for recognition of term candidates, a method for conflation of morphological variants and a module for foreign word recognition. Morpho-syntactic filters describe general term formation patterns, and are implemented as generic regular expressions. The inner structure together with the agreements within term candidates are used as clues to discover the boundaries of nested terms. The results of the terminological processing of a textbook corpus in the domains of mathematics and computer science are presented.
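As a rough illustration of a morpho-syntactic filter written as a regular expression, the sketch below matches a pattern over part-of-speech tags rather than over the words themselves. The one-letter tag codes, the pattern, and the function name `extract_candidates` are assumptions made for this example. They are not the filters used in SRCE, and the sketch does not check agreement, conflate morphological variants, or detect nested terms.

```python
import re

# Hypothetical one-letter tag codes (an assumption for this sketch):
# A = adjective, N = noun, V = verb, S = preposition.
# Hypothetical filter: adjectives followed by nouns, or a run of two or more nouns.
TERM_PATTERN = re.compile(r"A+N+|N{2,}")


def extract_candidates(tagged):
    """Return multiword term candidates from a list of (word, tag) pairs."""
    # One character per token, so regex offsets map directly to token indices.
    tags = "".join(tag for _, tag in tagged)
    candidates = []
    for match in TERM_PATTERN.finditer(tags):
        words = [word for word, _ in tagged[match.start():match.end()]]
        candidates.append(" ".join(words))
    return candidates


# "We solve a linear equation in the set of real numbers."
sample = [
    ("rešavamo", "V"),
    ("linearnu", "A"),
    ("jednačinu", "N"),
    ("u", "S"),
    ("skupu", "N"),
    ("realnih", "A"),
    ("brojeva", "N"),
]

print(extract_candidates(sample))
```

Because each token contributes exactly one character to the tag string, a match span can be sliced straight out of the token list. The candidates come out in their inflected forms, which is why a conflation step of the kind the abstract mentions would still be needed afterwards.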
Similar resources
Augmenting a Small Parallel Text with Morpho-syntactic Language Resources for Serbian-English Statistical Machine Translation
In this work, we examine the quality of several statistical machine translation systems constructed on a small amount of parallel Serbian-English text. The main bilingual parallel corpus consists of about 3k sentences and 20k running words from an unrestricted domain. The translation systems are built on the full corpus as well as on a reduced corpus containing only 200 parallel sentences. A sm...
Informations morpho-syntaxiques et adaptation thématique pour améliorer la reconnaissance de la parole
A way to improve outputs produced by automatic speech recognition (ASR) systems is to integrate additional linguistic knowledge. Our research in this field focuses on two aspects: morpho-syntactic information and thematic adaptation. In the first part, we propose a new mode of integration of parts of speech in a post-processing stage of speech decoding. To do this, we tag N-best sentenc...
A Study on Morpho-Syntactic Patterns: A Cohesive Device in Some Persian Live Sport Radio and TV Talks
Morpho-syntactic patterns are a subcategory of cohesive devices that help hearers form an adequate mental representation for understanding speech. This article investigates the morpho-syntactic patterns employed in some Persian live sport radio and TV programs, adapting Dooley and Levinsohn’s theoretical and analytical framework. The research data includes around 30,000 ...
Augmenting a Small Parallel Text with Morpho-Syntactic Language
In this work, we examine the quality of several statistical machine translation systems constructed on a small amount of parallel Serbian-English text. The main bilingual parallel corpus consists of about 3k sentences and 20k running words from an unrestricted domain. The translation systems are built on the full corpus as well as on a reduced corpus containing only 200 parallel sentences. A sm...
Morpho-Syntactic Descriptions in MULTEXT-East - the Case of Serbian
Cvetana Krstev,∗ Duško Vitas† and Tomaž Erjavec‡ ∗Faculty of Philology, University of Belgrade Studentski trg 3, 11000 Belgrade Serbia and Montenegro [email protected] †Faculty of Mathematics, University of Belgrade Studentski trg 16, 11000 Belgrade Serbia and Montenegro [email protected] ‡Department of Knowledge Technologies Jožef Stefan Institute Jamova 39, 1000 Ljubljana Slovenia tomaz.e...